Search Results: "rsl"

8 November 2011

Jonathan McDowell: The cost of progress

You should probably ignore this post. I'm just venting. I'll be better after a nice cup of tea. Things that are causing me to fume about the fact Gnome Shell just hit Debian/Testing: I update my testing boxes (work + home laptops) almost every day. It rarely breaks, and certainly when it does I accept that's what I get for doing rolling upgrades. I can't remember the last time I did an upgrade that actually made me angry. Also I suspect this thing is going to have a complete fit on my binary nVidia/hacked up DisplayLink configuration at work (the DisplayLink side refuses to do 3D for starters). Perhaps better not to upgrade there until I have a sufficient block of free time. Maybe it's time to go back to evilwm. I only stopped because I wanted a dock for wifi/bluetooth etc applets on my laptop that didn't get hidden when I fullscreened things. Implementing _NET_WM_STRUT might make that doable... (I'm sure some of this is just dealing with the change but it's a bit bloody difficult to deal with a complete change in user interface that hasn't even managed to carry across settings from the old one.)

24 July 2011

Tore S. Bekkedal: Utøya

Update: I have written an English version of this post, which I have had more time to think through. I originally meant to keep this as a draft and go through it later to write more, but WordPress is acting up and I can't be bothered to figure it out, so I am publishing it now. I have long wanted to set up a new blog engine but have never gotten around to it, and this is definitely not the time, so I am reviving my old, English-language blog. Others have written their accounts of what happened on Utøya. I wanted to write down mine as well, to get it out. Partly I want to write it down because I don't know whether I will remember all these details later; in a way I hope I won't. Partly also so that I won't have to describe all of this, over and over, to everyone who asks. Gro Harlem Brundtland had recently left the camp. I had recorded a video greeting from her, and was in the media group's office to edit the file into something we could upload to YouTube. One of the people in the room gave a start and said that Twitter was full of reports of an explosion in Oslo. As the press reported where the attack had taken place, it became more and more clear that an information meeting was in order. It was agreed that the meeting would be held after the block of opening talks was finished. The information meeting was held, and as the local nerd-in-chief I took it upon myself to set up a laptop that could show NRK's web TV so people could see what was happening. The wireless network went down almost immediately, so we agreed that a password should be put on it. While waiting for someone else to do that, I went to the toilet. As I sat there, I first heard agitated shouting, then screaming, then shots. It sounded more like a toy gun than anything else, and I assumed it was just someone making a totally tasteless joke. With that in mind, I came practically storming out of the stall, but as I flung the door open I saw two people signalling that I should get back inside. The expressions on their faces left absolutely no doubt that this was serious. That they were there undoubtedly saved my life. More than anything, I was confused. There and then I assumed it was an AUF member who had done this, run amok. I peeked out again. The two were still there. This time I saw a person lying on the floor, in a pool of blood. I had eye contact with him, and he clearly signalled for help. I immediately thought that those doors would not stop any kind of bullet, so all I could think about was getting away from the stall. My first thought was to get outdoors. I ran out into the corridor, and just then one of the café crew came towards me. She opened the staff toilet, and I, she and one other person threw ourselves in. That is just one of the many coincidences which, in retrospect, I have to accept probably saved my life. We sat there for an hour and a half. Ready to run, ready for just about anything. A peculiar dynamic developed between the three of us, a kind of perplexed gallows humour. Since it was entirely realistic that our group was the only one who knew about the bodies in the building, I immediately tried to report it. 112 did not work. 110 did not work. In the end I got through on 113, and they could tell me that the police were aware of the situation and on their way. It would take an hour and a half, and by the time we were evacuated he lay there dead. The real police finally arrived. We went out. I took the route around the small hall, something I regret in retrospect; what I saw there will probably stay with me for a while.
There lay a heap of people. I remember no faces, just one big, amorphous jumble of bodies in a terribly large pool of blood; some of them, I believe I remember two, were still conscious. We were first moved to the Planet Utøya office, the camp newspaper's, where there was a girl who had been shot. We tore open some bags of sweaters from the storeroom and covered her. We sat there for a while; I had long since lost all sense of time; I believe there must have been twelve of us. One of the youths was handcuffed, despite protests from those in the group who knew and vouched for him. I did not quite understand why. Later it was explained to us that it was because he had come from an area the police did not yet have under control. I did not see when the handcuffs were taken off, but I do remember that it struck me as undignified treatment. The police officer on guard at the post where we were was very good at explaining the situation to us. We were moved out into the main corridor, where about 50 of us were gathered. Only when I saw and embraced the people who most likely saved my life did I break down in tears. I quickly pulled myself together, got the shaking under control, and after a good deal of waiting we were marched, hands on our heads, towards the ferry. I remember how afraid I was that someone would slip on the steep, muddy slope and create a misunderstanding. There were several bodies outside. Some were covered in improvised ways, others simply lay there. Everyone I saw showed courage, calm and composure at a level far beyond what anyone would wish to expect from people in those age groups. There was a solidarity and a sense of purpose that made an impression. Once across the fjord we were offered blankets. I was asked whether I was injured, and asked to show my abdomen. We were guided onto the bus and driven to the Sundvolden hotel. The feeling, the utterly unbelievable relief of seeing your closest friends again, I really cannot describe. The relief was tempered by the uncertainty about those you did not see. Among them are people whom, as I write this, I must assume have lost their lives. Among them are people I had earlier taken great joy in knowing would go on to do fantastic things in the service of this country and the world; now they have been torn away. I don't know how much more than this rather sober report of the events I can manage right now. But this, at least, is how I experienced the situation I got out of thanks to a series of coincidences. Briefly, I will add: the rescue services were a great help. Just as great, if not greater, has been the opportunity to spend time with the other survivors. We stick together and comfort each other with our shared experience. I see one bright spot here: most of all I am so incredibly glad that he did not arrive twenty minutes earlier, during the information meeting. Then all of Utøya was packed like sardines in the main hall, and with his automatic weapon the death toll would, within a few tens of seconds, have become many times what we ended up with. I am painfully aware that this is very meagre consolation for the bereaved. My very deepest thoughts are with them all. This was an attack on the whole of Norwegian democracy. In the confusion I feel a bitter defiance. We will show his like-minded that it is stronger than that. I will not let myself be frightened into silence and passivity. I want to remember the dead, and then honour them by continuing the work they were part of.

13 May 2011

Mike Hommey: Debian Squeeze + btrfs = FAIL

Executive summary: Don't use btrfs on Debian Squeeze.
Longer summary: Don't use btrfs RAID with the kernel Debian Squeeze comes with. About six months ago, I set up a new server to handle this web site, mail, and various other things. The system and most services (including web and mail) were set to use an MD RAID 1 array across two small partitions on two separate disks, and the remaining space was set up as three different btrfs file systems. Three days ago, this happened:
May 10 10:18:04 goemon kernel: [3545898.548311] ata4: hard resetting link
May 10 10:18:04 goemon kernel: [3545898.867556] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May 10 10:18:04 goemon kernel: [3545898.874973] ata4.00: configured for UDMA/33
followed by other ATA-related messages, then garbage such as:
May 10 10:18:07 goemon kernel: [3545901.28123] sd3000 d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.821 ecio es aawt es ecitr i e)
May 10 10:18:07 goemon kernel: 6[550.824     20 00 00 00 00 00 00 00 <>3491225     16 44 <>3491216]s ::::[d]Ad es:N diinlsneifrain<>3491216]s ::::[d]C:Ra(0:2 00 03 80 06 0
3491217]edrqet / ro,dvsb etr2272
May 10 10:18:07 goemon kernel: 3[550.837 ad:sb:rshdln etr2252
May 10 10:18:07 goemon kernel: 6[551214]s ::::[d]Rsl:hsbt=I_Kdiebt=RVRSNE<>3491215]s ::::[d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.833 ecitrsnedt ihsnedsrpos(nhx:<>3491216]    7 b0 00 00 c0 a8 00 00 0
Then later on:
May 10 12:01:18 goemon kernel: [3552089.226147] lost page write due to I/O error on sdb4
May 10 12:01:18 goemon kernel: [3552089.226312] lost page write due to I/O error on sdb4
May 10 12:10:14 goemon kernel: [3552624.625669] btrfs no csum found for inode 23642 start 0
May 10 12:10:14 goemon kernel: [3552624.625783] btrfs no csum found for inode 23642 start 4096
May 10 12:10:14 goemon kernel: [3552624.625884] btrfs no csum found for inode 23642 start 8192
etc., and more garbage. At that point, I wanted to shut down the server, check the hardware, and reboot. The shutdown didn't want to proceed completely: btrfs just froze on the sync happening during the shutdown phase, so I had to power off violently. Nothing seemed really problematic on the hardware end, and after a reboot, both disks were working properly. The MD RAID would resynchronize, and the btrfs filesystems would be automatically mounted. It would work for a while, until such things could be seen in the logs, with more garbage as above in between:
May 10 14:41:18 goemon kernel: [ 1253.455545] __ratelimit: 35363 callbacks suppressed
May 10 14:45:04 goemon kernel: [ 1478.717749] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.717936] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.717939] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.718128] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.718131] parent transid verify failed on 358190825472 wanted 42547 found 42525
Then there would be kernel btrfs processes going on and on, sucking CPU and I/O, doing whatever they were doing. At such moments, most file reads off one of the btrfs volumes would either take very long or freeze, and unmounting would only freeze. At that point, considering that the advantages of btrfs (in my case, mostly snapshots) were outweighed by such issues (this wasn't my first btrfs fuck-up, but it was by far the most dreadful) and the fact that btrfs is just so slow compared to other filesystems, I decided I didn't care to try saving these filesystems from their agonizing death, and that I'd just go with ext4 on MD RAID instead. Also, I didn't want to just try again (with the possibility of going through similar pain) with a more recent kernel. Fortunately, I had backups of most of the data (the only problem being the time required to restore that amount of data), but for the few remaining things which, by force of bad timing, I didn't have a backup of, I needed to somehow get them back from these btrfs volumes. So I created new file systems to replace the btrfs volumes I could directly throw away and started recovering data from backups. I also, at the same time, tried to copy a big disk image from the remaining btrfs volume. Somehow, this worked, with the system load varying between 20 and 60 (with a lot of garbage in the logs and other services deeply impacted as well). But when trying to copy the remaining files I wanted to recover, things got worse, so I had to initiate a shutdown and power cycle again. Since apparently the kernel wasn't going to be very helpful, the next step was to just get other things working, and get the data back some other way. What I did was to use a virtual machine to get the data off the remaining btrfs volume. The kernel could become unusable all it wanted to; I could just hard reboot without impacting the other services. In the virtual machine, things got interesting. I did try various things I've seen on the linux-btrfs list, but nothing really did anything at all except spew some more parent transid messages. I should mention that the remaining btrfs volume was a RAID 0. To mount those, you'd mount one of the constituting disks like this:
$ mount /dev/sdb /mnt
Except that it would complain that it couldn't find a valid whatever (I don't remember the exact term, and I threw the VM away already), so it wouldn't mount the volume. But when mounting the other constituting disk, it would just work. Well, that's kind of understandable, but what is not is that on the next boot (I had to reboot a lot, see below), it would error out on the disk that worked previously, and work on the disk that was failing before. So, here is how things went. Ain't that fun? The good thing is that in the end, despite the pain, I recovered all that needed to be recovered. I'm in the process of recreating my build chroots from scratch, but that's not exactly difficult. It would just have taken a lot more time to recover them the same way, 50 files at a time. Side note: yes, I did try newer versions of btrfsck; yes, I did try newer kernels. No, nothing worked to make these btrfs volumes viable. No, I don't have an image of these completely fucked up volumes.
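As an aside, and not something from the post above: with a healthy multi-device btrfs volume, the usual trick is to make the kernel aware of all the member devices first, or to list them explicitly as mount options. A hedged sketch, assuming the RAID 0 pair was /dev/sdb and /dev/sdc:
$ btrfs device scan
$ mount -o device=/dev/sdb,device=/dev/sdc /dev/sdb /mnt
That obviously doesn't help once the trees themselves are corrupted, as was the case here.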

2 March 2011

Sandro Tosi: MySQL master/slave chain

Have you ever needed to create a MySQL database replication chain like A->B->C, where B is a slave of A and a master of C? Me neither, until yesterday.

Since it took us about an afternoon to make it work (along with our DBAs, so we're not alone ;)), let's share some knowledge.

A very brief recap of how MySQL replication works:

  1. the slave I/O thread connects to the master, fetches the new events from the master's binlog files, and stores them in the relay log;
  2. the slave SQL thread reads the relay log and applies the changes to the slave database, without writing them to the slave's own binlog files.
That said, B replicates correctly from A, but C is unable to replicate from B, because B does not record the updates coming from A in its own binlog files, since no changes are made directly on B.

In order to make the chain work, you need to add the parameter log-slave-updates to B's configuration: that makes B replay the changes from its relay log into its own binlog, so C will see the changes it needs to replicate correctly.
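For concreteness, here is a minimal my.cnf sketch for B; the server-id value and log file names are illustrative assumptions, not taken from an actual setup:
[mysqld]
server-id         = 2           # must be unique across A, B and C
log-bin           = mysql-bin   # B needs its own binlog to act as a master for C
log-slave-updates               # write changes replicated from A into B's binlog
relay-log         = relay-bin
A and C only need the usual master and slave settings; the log-slave-updates line on B is what turns it into a working middle link of the chain.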

PS: mysqldump --master-data (executed on the slave server against the master) would help you set up the correct information for replication.
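As a hedged illustration of that PS (host names and options are made up, adjust to taste): seed C with a dump taken from B, then point C at B using the coordinates recorded in the dump:
$ mysqldump -h B --master-data=2 --single-transaction --all-databases > b_snapshot.sql
$ mysql -h C < b_snapshot.sql
The dump contains a commented-out CHANGE MASTER TO statement with the right MASTER_LOG_FILE and MASTER_LOG_POS values; run it on C (pointing MASTER_HOST at B), then START SLAVE.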

27 December 2010

Petter Reinholdtsen: The many definitions of an open standard

One of the reasons I like the Digistan definition of "Free and Open Standard" is that this is a new term, and thus the meaning of the term has been decided by Digistan. The term "Open Standard" has become so misunderstood that it is no longer very useful when talking about standards. One ends up discussing which definition is the best one, and within such a frame the only ones gaining are the proponents of de-facto standards and proprietary solutions. But to give us an idea about the diversity of definitions of open standards, here are a few that I know about. This list is not complete, but it can be a starting point for those that want to do a complete survey. More definitions are available on the wikipedia page. First off is my favourite, the definition from the European Interoperability Framework version 1.0. It is really sad to notice that BSA and others have succeeded in getting it removed from version 2.0 of the framework by stacking the committee drafting the new version with their own people. Anyway, the definition is still available, and it includes the key properties needed to make sure everyone can use a specification on equal terms.
The following are the minimal characteristics that a specification and its attendant documents must have in order to be considered an open standard:
  • The standard is adopted and will be maintained by a not-for-profit organisation, and its ongoing development occurs on the basis of an open decision-making procedure available to all interested parties (consensus or majority decision etc.).
  • The standard has been published and the standard specification document is available either freely or at a nominal charge. It must be permissible to all to copy, distribute and use it for no fee or at a nominal fee.
  • The intellectual property - i.e. patents possibly present - of (parts of) the standard is made irrevocably available on a royalty-free basis.
  • There are no constraints on the re-use of the standard.
Another one originates from my friends over at DKUUG, who coined and gathered support for this definition in 2004. It even made it into the Danish parliament as their definition of an open standard. Another one, from a different part of the Danish government, is available from the wikipedia page.
An open standard meets the following requirements:
  1. It is well documented, with the complete specification publicly available.
  2. It is freely implementable without economic, political or legal restrictions on implementation and use.
  3. It is standardised and maintained in an open forum (a so-called "standards organisation") via an open process.
Then there is the definition from Free Software Foundation Europe.
An Open Standard refers to a format or protocol that is
  1. subject to full public assessment and use without constraints in a manner equally available to all parties;
  2. without any components or extensions that have dependencies on formats or protocols that do not meet the definition of an Open Standard themselves;
  3. free from legal or technical clauses that limit its utilisation by any party or in any business model;
  4. managed and further developed independently of any single vendor in a process open to the equal participation of competitors and third parties;
  5. available in multiple complete implementations by competing vendors, or as a complete implementation equally available to all parties.
A long time ago, SUN Microsystems, now bought by Oracle, created its Open Standards Checklist with a fairly detailed description.
Creation and Management of an Open Standard
  • Its development and management process must be collaborative and democratic:
    • Participation must be accessible to all those who wish to participate and can meet fair and reasonable criteria imposed by the organization under which it is developed and managed.
    • The processes must be documented and, through a known method, can be changed through input from all participants.
    • The process must be based on formal and binding commitments for the disclosure and licensing of intellectual property rights.
    • Development and management should strive for consensus, and an appeals process must be clearly outlined.
    • The standard specification must be open to extensive public review at least once in its life-cycle, with comments duly discussed and acted upon, if required.
Use and Licensing of an Open Standard
  • The standard must describe an interface, not an implementation, and the industry must be capable of creating multiple, competing implementations to the interface described in the standard without undue or restrictive constraints. Interfaces include APIs, protocols, schemas, data formats and their encoding.
  • The standard must not contain any proprietary "hooks" that create technical or economic barriers
  • Faithful implementations of the standard must interoperate. Interoperability means the ability of a computer program to communicate and exchange information with other computer programs and mutually to use the information which has been exchanged. This includes the ability to use, convert, or exchange file formats, protocols, schemas, interface information or conventions, so as to permit the computer program to work with other computer programs and users in all the ways in which they are intended to function.
  • It must be permissible for anyone to copy, distribute and read the standard for a nominal fee, or even no fee. If there is a fee, it must be low enough to not preclude widespread use.
  • It must be possible for anyone to obtain free (no royalties or fees; also known as "royalty free"), worldwide, non-exclusive and perpetual licenses to all essential patent claims to make, use and sell products based on the standard. The only exceptions are terminations per the reciprocity and defensive suspension terms outlined below. Essential patent claims include pending, unpublished patents, published patents, and patent applications. The license is only for the exact scope of the standard in question.
    • May be conditioned only on reciprocal licenses to any of licensees' patent claims essential to practice that standard (also known as a reciprocity clause)
    • May be terminated as to any licensee who sues the licensor or any other licensee for infringement of patent claims essential to practice that standard (also known as a "defensive suspension" clause)
    • The same licensing terms are available to every potential licensor
  • The licensing terms of an open standard must not preclude implementations of that standard under open source licensing terms or restricted licensing terms
It is said that one of the nice things about standards is that there are so many of them. As you can see, the same holds true for open standard definitions. Most of the definitions have a lot in common, and it is not really controversial what properties an open standard should have, but the diversity of definitions has made it possible for those who want to avoid a level playing field and real competition to downplay the significance of open standards. I hope we can turn this tide by focusing on the advantages of Free and Open Standards.

15 October 2010

Enrico Zini: Award winning code

Yuwei and I had a fun day at hhhmcr (#hhhmcr) and even managed to put together a prototype that won the first prize \o/ We played with the gmp24 dataset kindly extracted from Twitter by Michael Brunton-Spall of the Guardian into a convenient JSON dataset. The idea was to find ways of making it easier to look at the data and make sense of it. This is the story of what we did, including the code we wrote. The original dataset has several JSON files, so the first task was to put them all together:
#!/usr/bin/python
# Merge the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import os
res = []
for f in os.listdir("."):
    if not f.startswith("gmp24"): continue
    data = open(f).read().strip()
    if data == "[]": continue
    parsed = simplejson.loads(data)
    res.extend(parsed)
print simplejson.dumps(res)
The results, however, were not ordered by date, as GMP had to use several accounts to twit because Twitter was putting Greater Manchester Police into jail for generating too much traffic. There would be quite a bit to write about that, but let's stick to our work. Here is code to sort the JSON data by time:
#!/usr/bin/python
# Sort the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import datetime as dt
all_recs = simplejson.load(sys.stdin)
all_recs.sort(key=lambda x: dt.datetime.strptime(x["created_at"], "%a %b %d %H:%M:%S +0000 %Y"))
simplejson.dump(all_recs, sys.stdout)
I then wanted to play with Tf-idf for extracting the most important words of every tweet:
#!/usr/bin/python
# tfifd - Annotate JSON elements with Tf-idf extracted keywords
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Read all the twits
records = simplejson.load(sys.stdin)
# All the twits by ID
byid = dict(((x["id"], x) for x in records))
# Stopwords we ignore
stopwords = set(["by", "it", "and", "of", "in", "a", "to"])
# Tokenising engine
re_num = re.compile(r"^\d+$")
re_word = re.compile(r"(\w+)")
def tokenise(tweet):
    "Extract tokens from a tweet"
    for tok in tweet["text"].split():
        tok = tok.strip().lower()
        if re_num.match(tok): continue
        mo = re_word.match(tok)
        if not mo: continue
        if mo.group(1) in stopwords: continue
        yield mo.group(1)
# Extract tokens from tweets
tokenised = dict(((x["id"], list(tokenise(x))) for x in records))
# Aggregate token counts
aggregated = {}
for d in byid.iterkeys():
    for t in tokenised[d]:
        if t in aggregated:
            aggregated[t] += 1
        else:
            aggregated[t] = 1
def tfidf(doc, tok):
    "Compute TFIDF score of a token in a document"
    return doc.count(tok) * math.log(float(len(byid)) / aggregated[tok])
# Annotate tweets with keywords
res = []
for name, tweet in byid.iteritems():
    doc = tokenised[name]
    keywords = sorted(set(doc), key=lambda tok: tfidf(doc, tok), reverse=True)[:5]
    tweet["keywords"] = keywords
    res.append(tweet)
simplejson.dump(res, sys.stdout)
I thought this was producing a nice summary of every tweet, but nobody was particularly interested, so we moved on to adding categories to tweets. Thanks to Yuwei, who put together some useful keyword sets, we managed to annotate each tweet with a place name (e.g. "Stockport"), a social place name (e.g. "pub", "bank") and a social category (e.g. "man", "woman", "landlord"...). The code is simple; the biggest work in it was the dictionary of keywords:
#!/usr/bin/python
# categorise - Annotate JSON elements with categories
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
# Copyright (C) 2010  Yuwei Lin <yuwei@ylin.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Electoral wards from http://en.wikipedia.org/wiki/List_of_electoral_wards_in_Greater_Manchester
placenames = ["Altrincham", "Sale West",
"Altrincham", "Ashton upon Mersey", "Bowdon", "Broadheath", "Hale Barns", "Hale Central", "St Mary", "Timperley", "Village",
"Ashton-under-Lyne",
"Ashton Hurst", "Ashton St Michael", "Ashton Waterloo", "Droylsden East", "Droylsden West", "Failsworth East", "Failsworth West", "St Peter",
"Blackley", "Broughton",
"Broughton", "Charlestown", "Cheetham", "Crumpsall", "Harpurhey", "Higher Blackley", "Kersal",
"Bolton North East",
"Astley Bridge", "Bradshaw", "Breightmet", "Bromley Cross", "Crompton", "Halliwell", "Tonge with the Haulgh",
"Bolton South East",
"Farnworth", "Great Lever", "Harper Green", "Hulton", "Kearsley", "Little Lever", "Darcy Lever", "Rumworth",
"Bolton West",
"Atherton", "Heaton", "Lostock", "Horwich", "Blackrod", "Horwich North East", "Smithills", "Westhoughton North", "Chew Moor", "Westhoughton South",
"Bury North",
"Church", "East", "Elton", "Moorside", "North Manor", "Ramsbottom", "Redvales", "Tottington",
"Bury South",
"Besses", "Holyrood", "Pilkington Park", "Radcliffe East", "Radcliffe North", "Radcliffe West", "St Mary", "Sedgley", "Unsworth",
"Cheadle",
"Bramhall North", "Bramhall South", "Cheadle", "Gatley", "Cheadle Hulme North", "Cheadle Hulme South", "Heald Green", "Stepping Hill",
"Denton", "Reddish",
"Audenshaw", "Denton North East", "Denton South", "Denton West", "Dukinfield", "Reddish North", "Reddish South",
"Hazel Grove",
"Bredbury", "Woodley", "Bredbury Green", "Romiley", "Hazel Grove", "Marple North", "Marple South", "Offerton",
"Heywood", "Middleton",
"Bamford", "Castleton", "East Middleton", "Hopwood Hall", "Norden", "North Heywood", "North Middleton", "South Middleton", "West Heywood", "West Middleton",
"Leigh",
"Astley Mosley Common", "Atherleigh", "Golborne", "Lowton West", "Leigh East", "Leigh South", "Leigh West", "Lowton East", "Tyldesley",
"Makerfield",
"Abram", "Ashton", "Bryn", "Hindley", "Hindley Green", "Orrell", "Winstanley", "Worsley Mesnes",
"Manchester Central",
"Ancoats", "Clayton", "Ardwick", "Bradford", "City Centre", "Hulme", "Miles Platting", "Newton Heath", "Moss Side", "Moston",
"Manchester", "Gorton",
"Fallowfield", "Gorton North", "Gorton South", "Levenshulme", "Longsight", "Rusholme", "Whalley Range",
"Manchester", "Withington",
"Burnage", "Chorlton", "Chorlton Park", "Didsbury East", "Didsbury West", "Old Moat", "Withington",
"Oldham East", "Saddleworth",
"Alexandra", "Crompton", "Saddleworth North", "Saddleworth South", "Saddleworth West", "Lees", "St James", "St Mary", "Shaw", "Waterhead",
"Oldham West", "Royton",
"Chadderton Central", "Chadderton North", "Chadderton South", "Coldhurst", "Hollinwood", "Medlock Vale", "Royton North", "Royton South", "Werneth",
"Rochdale",
"Balderstone", "Kirkholt", "Central Rochdale", "Healey", "Kingsway", "Littleborough Lakeside", "Milkstone", "Deeplish", "Milnrow", "Newhey", "Smallbridge", "Firgrove", "Spotland", "Falinge", "Wardle", "West Littleborough",
"Salford", "Eccles",
"Claremont", "Eccles", "Irwell Riverside", "Langworthy", "Ordsall", "Pendlebury", "Swinton North", "Swinton South", "Weaste", "Seedley",
"Stalybridge", "Hyde",
"Dukinfield Stalybridge", "Hyde Godley", "Hyde Newton", "Hyde Werneth", "Longdendale", "Mossley", "Stalybridge North", "Stalybridge South",
"Stockport",
"Brinnington", "Central", "Davenport", "Cale Green", "Edgeley", "Cheadle Heath", "Heatons North", "Heatons South", "Manor",
"Stretford", "Urmston",
"Bucklow-St Martins", "Clifford", "Davyhulme East", "Davyhulme West", "Flixton", "Gorse Hill", "Longford", "Stretford", "Urmston",
"Wigan",
"Aspull New Springs Whelley", "Douglas", "Ince", "Pemberton", "Shevington with Lower Ground", "Standish with Langtree", "Wigan Central", "Wigan West",
"Worsley", "Eccles South",
"Barton", "Boothstown", "Ellenbrook", "Cadishead", "Irlam", "Little Hulton", "Walkden North", "Walkden South", "Winton", "Worsley",
"Wythenshawe", "Sale East",
"Baguley", "Brooklands", "Northenden", "Priory", "Sale Moor", "Sharston", "Woodhouse Park"]
# Manual coding from Yuwei
placenames.extend(["City centre", "Tameside", "Oldham", "Bury", "Bolton",
"Trafford", "Pendleton", "New Moston", "Denton", "Eccles", "Leigh", "Benchill",
"Prestwich", "Sale", "Kearsley", ])
placenames.extend(["Trafford", "Bolton", "Stockport", "Levenshulme", "Gorton",
"Tameside", "Blackley", "City centre", "Airport", "South Manchester",
"Rochdale", "Chorlton", "Uppermill", "Castleton", "Stalybridge", "Ashton",
"Chadderton", "Bury", "Ancoats", "Whalley Range", "West Yorkshire",
"Fallowfield", "New Moston", "Denton", "Stretford", "Eccles", "Pendleton",
"Leigh", "Altrincham", "Sale", "Prestwich", "Kearsley", "Hulme", "Withington",
"Moss Side", "Milnrow", "outskirt of Manchester City Centre", "Newton Heath",
"Wythenshawe", "Mancunian Way", "M60", "A6", "Droylesden", "M56", "Timperley",
"Higher Ince", "Clayton", "Higher Blackley", "Lowton", "Droylsden",
"Partington", "Cheetham Hill", "Benchill", "Longsight", "Didsbury",
"Westhoughton"])
# Social categories from Yuwei
soccat = ["man", "woman", "men", "women", "youth", "teenager", "elderly",
"patient", "taxi driver", "neighbour", "male", "tenant", "landlord", "child",
"children", "immigrant", "female", "workmen", "boy", "girl", "foster parents",
"next of kin"]
for i in range(100):
    soccat.append("%d-year-old" % i)
    soccat.append("%d-years-old" % i)
# Types of social locations from Yuwei
socloc = ["car park", "park", "pub", "club", "shop", "premises", "bus stop",
"property", "credit card", "supermarket", "garden", "phone box", "theatre",
"toilet", "building site", "Crown court", "hard shoulder", "telephone kiosk",
"hotel", "restaurant", "cafe", "petrol station", "bank", "school",
"university"]
extras = {"placename": placenames, "soccat": soccat, "socloc": socloc}
# Normalise keyword lists
for k, v in extras.iteritems():
    # Remove duplicates
    v = list(set(v))
    # Sort by length, so longer keywords are matched first
    v.sort(key=lambda x:len(x), reverse=True)
    # Store the cleaned-up list back, otherwise the normalisation is lost
    extras[k] = v
# Add keywords
def add_categories(tweet):
    text = tweet["text"].lower()
    for field, categories in extras.iteritems():
        for cat in categories:
            if cat.lower() in text:
                tweet[field] = cat
                break
    return tweet
# Read all the twits
records = (add_categories(x) for x in simplejson.load(sys.stdin))
simplejson.dump(list(records), sys.stdout)
All these scripts form a nice processing chain: each script takes a list of JSON records, adds some bits and passes it on. In order to see what we have so far, here is a simple script to convert the JSON twits to CSV so they can be viewed in a spreadsheet:
#!/usr/bin/python
# Convert the JSON twits to CSV
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import csv
rows = ["id", "created_at", "text", "keywords", "placename"]
writer = csv.writer(sys.stdout)
for rec in simplejson.load(sys.stdin):
    rec["keywords"] = " ".join(rec["keywords"])
    rec["placename"] = rec.get("placename", "")
    writer.writerow([rec[row] for row in rows])
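To give an idea of how the chain fits together end to end, here is a hedged sketch of running it; the script file names are assumptions, since the post only shows their contents:
$ ./merge.py | ./sortbydate.py | ./tfidf.py | ./categorise.py > tweets.json
$ ./tocsv.py < tweets.json > tweets.csv
Every stage reads JSON records on standard input (except the merge step, which scans the gmp24 files in the current directory) and writes JSON to standard output, so the stages can be freely recombined.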
At this point we were coming up with lots of questions: "were there more reports on women or men?", "which place had most incidents?", "what were the incidents involving animals?"... Time to bring Xapian into play. This script reads all the JSON tweets and builds a Xapian index with them:
#!/usr/bin/python
# toxapian - Index JSON tweets in Xapian
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.WritableDatabase(DBNAME, xapian.DB_CREATE_OR_OPEN)
stemmer = xapian.Stem("english")
indexer = xapian.TermGenerator()
indexer.set_stemmer(stemmer)
indexer.set_database(db)
data = simplejson.load(sys.stdin)
for rec in data:
    doc = xapian.Document()
    doc.set_data(str(rec["id"]))
    indexer.set_document(doc)
    indexer.index_text_without_positions(rec["text"])
    # Index categories as categories
    if "placename" in rec:
        doc.add_boolean_term("XP" + rec["placename"].lower())
    if "soccat" in rec:
        doc.add_boolean_term("XS" + rec["soccat"].lower())
    if "socloc" in rec:
        doc.add_boolean_term("XL" + rec["socloc"].lower())
    db.add_document(doc)
db.flush()
# Also save the whole dataset so we know where to find it later if we want to
# show the details of an entry
simplejson.dump(data, open(os.path.join(DBNAME, "all.json"), "w"))
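A hedged usage sketch, with assumed file names: the indexer takes the database directory as its first argument, creates it if needed, and reads the JSON records on standard input:
$ ./toxapian.py tweets.db < tweets.json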
And this is a simple command line tool to query the database:
#!/usr/bin/python
# xgrep - Command line tool to query the GMP24 tweet Xapian database
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
qp.add_boolean_prefix("place", "XP")
qp.add_boolean_prefix("soc", "XS")
qp.add_boolean_prefix("loc", "XL")
query = qp.parse_query(sys.argv[2],
    xapian.QueryParser.FLAG_BOOLEAN |
    xapian.QueryParser.FLAG_LOVEHATE |
    xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
    xapian.QueryParser.FLAG_WILDCARD |
    xapian.QueryParser.FLAG_PURE_NOT |
    xapian.QueryParser.FLAG_SPELLING_CORRECTION |
    xapian.QueryParser.FLAG_AUTO_SYNONYMS)
enquire = xapian.Enquire(db)
enquire.set_query(query)
count = 40
matches = enquire.get_mset(0, count)
estimated = matches.get_matches_estimated()
print "%d/%d results" % (matches.size(), estimated)
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
for m in matches:
    rec = data[m.document.get_data()]
    print rec["text"]
print "%d/%d results" % (matches.size(), matches.get_matches_estimated())
total = db.get_doccount()
estimated = matches.get_matches_estimated()
print "%d results over %d documents, %d%%" % (estimated, total, estimated * 100 / total)
Neat! Now that we have a proper index that supports all sorts of cool things, like stemming, tag clouds, full text search with complex queries, lookup of similar documents, keyword suggestions and so on, it was only fair to put together a web service to share it with other people at the event. It helped that I had already written similar code for apt-xapian-index and dde before. Here is the server, quickly built on bottle. The very last line starts the server, and it is where you can configure the listening interface and port.
#!/usr/bin/python
# xserve - Make the GMP24 tweet Xapian database available on the web
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import bottle
from bottle import route, post
from cStringIO import StringIO
import cPickle as pickle
import simplejson
import sys
import os, os.path
import xapian
import urllib
import math
bottle.debug(True)
DBNAME = sys.argv[1]
QUERYLOG = os.path.join(DBNAME, "queries.txt")
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
prefixes = {"place": "XP", "soc": "XS", "loc": "XL"}
prefix_desc = {"place": "Place name", "soc": "Social category", "loc": "Social location"}
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
for k, v in prefixes.iteritems():
    qp.add_boolean_prefix(k, v)
def make_query(qstring):
    return qp.parse_query(qstring,
        xapian.QueryParser.FLAG_BOOLEAN |
        xapian.QueryParser.FLAG_LOVEHATE |
        xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
        xapian.QueryParser.FLAG_WILDCARD |
        xapian.QueryParser.FLAG_PURE_NOT |
        xapian.QueryParser.FLAG_SPELLING_CORRECTION |
        xapian.QueryParser.FLAG_AUTO_SYNONYMS)
@route("/")
def index():
    query = urllib.unquote_plus(bottle.request.GET.get("q", ""))
    out = StringIO()
    print >>out, '''
<html>
<head>
<title>Query</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript">
$(function() {
    $("#queryfield")[0].focus()
})
</script>
</head>
<body>
<h1>Search</h1>
<form method="POST" action="/query">
Keywords: <input type="text" name="query" value="%s" id="queryfield">
<input type="submit">
<a href="http://xapian.org/docs/queryparser.html">Help</a>
</form>''' % query
    print >>out, '''
<p>Example: "car place:wigan"</p>

<p>Available prefixes:</p>

<ul>
'''
    for pfx in prefixes.keys():
        print >>out, "<li><a href='/catinfo/%s'>%s - %s</a></li>" % (pfx, pfx, prefix_desc[pfx])
    print >>out, '''
</ul>
'''
    oldqueries = []
    if os.path.exists(QUERYLOG):
        total = db.get_doccount()
        fd = open(QUERYLOG, "r")
        while True:
            try:
                q = pickle.load(fd)
            except EOFError:
                break
            oldqueries.append(q)
        fd.close()
        def print_query(q):
            count = q["count"]
            print >>out, "<li><a href='/query?query=%s'>%s (%d/%d %.2f%%)</a></li>" % (urllib.quote_plus(q["q"]), q["q"], count, total, count * 100.0 / total)
        print >>out, "<p>Last 10 queries:</p><ul>"
        for q in oldqueries[:-10:-1]:
            print_query(q)
        print >>out, "</ul>"
        # Remove duplicates
        oldqueries = dict(((x["q"], x) for x in oldqueries)).values()
        print >>out, "<table>"
        print >>out, "<tr><th>10 queries with most results</th><th>10 queries with least results</th></tr>"
        print >>out, "<tr><td>"
        print >>out, "<ul>"
        oldqueries.sort(key=lambda x:x["count"], reverse=True)
        for q in oldqueries[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td><td>"
        print >>out, "<ul>"
        nonempty = [x for x in oldqueries if x["count"] > 0]
        nonempty.sort(key=lambda x:x["count"])
        for q in nonempty[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td></tr>"
        print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
@route("/query")
@route("/query/")
@post("/query")
@post("/query/")
def query():
    query = bottle.request.POST.get("query", bottle.request.GET.get("query", ""))
    enquire = xapian.Enquire(db)
    enquire.set_query(make_query(query))
    count = 40
    matches = enquire.get_mset(0, count)
    estimated = matches.get_matches_estimated()
    total = db.get_doccount()
    out = StringIO()
    print >>out, '''
<html>
<head><title>Results</title></head>
<body>
<h1>Results for "<b>%s</b>"</h1>
''' % query
    if estimated == 0:
        print >>out, "No results found."
    else:
        # Give as results the first 30 documents; also use them as the key
        # ones to use to compute relevant terms
        rset = xapian.RSet()
        for m in enquire.get_mset(0, 30):
            rset.add_document(m.document.get_docid())
        # Compute the tag cloud
        class NonTagFilter(xapian.ExpandDecider):
            def __call__(self, term):
                return not term[0].isupper() and not term[0].isdigit()
        cloud = []
        maxscore = None
        for res in enquire.get_eset(40, rset, NonTagFilter()):
            # Normalise the score in the interval [0, 1]
            weight = math.log(res.weight)
            if maxscore == None: maxscore = weight
            tag = res.term
            cloud.append([tag, float(weight) / maxscore])
        max_weight = cloud[0][1]
        min_weight = cloud[-1][1]
        cloud.sort(key=lambda x:x[0])
        def mklink(query, term):
            return "/query?query=%s" % urllib.quote_plus(query + " and " + term)
        print >>out, "<h2>Tag cloud</h2>"
        print >>out, "<blockquote>"
        for term, weight in cloud:
            size = 100 + 100.0 * (weight - min_weight) / (max_weight - min_weight)
            print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(query, term), size, term)
        print >>out, "</blockquote>"
        print >>out, "<h2>Results</h2>"
        print >>out, "<p><a href='/'>Search again</a></p>"
        print >>out, "<p>%d results over %d documents, %.2f%%</p>" % (estimated, total, estimated * 100.0 / total)
        print >>out, "<p>%d/%d results</p>" % (matches.size(), estimated)
        print >>out, "<ul>"
        for m in matches:
            rec = data[m.document.get_data()]
            print >>out, "<li><a href='/item/%s'>%s</a></li>" % (rec["id"], rec["text"])
        print >>out, "</ul>"
        fd = open(QUERYLOG, "a")
        qinfo = dict(q=query, count=estimated)
        pickle.dump(qinfo, fd)
        fd.close()
    print >>out, '''
<a href="/">Search again</a>

</body>
</html>'''
    return out.getvalue()
@route("/item/:id")
@route("/item/:id/")
def show(id):
    rec = data[id]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Result %s</title></head>
<body>
<h1>Raw JSON record for twit %s</h1>
<pre>''' % (rec["id"], rec["id"])
    print >>out, simplejson.dumps(rec, indent=" ")
    print >>out, '''
</pre>
</body>
</html>'''
    return out.getvalue()
@route("/catinfo/:name")
@route("/catinfo/:name/")
def catinfo(name):
    prefix = prefixes[name]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Values for %s</title></head>
<body>
''' % name
    terms = [(x.term[len(prefix):], db.get_termfreq(x.term)) for x in db.allterms(prefix)]
    terms.sort(key=lambda x:x[1], reverse=True)
    # terms is sorted by descending frequency
    freq_max = terms[0][1]
    freq_min = terms[-1][1]
    def mklink(name, term):
        return "/query?query=%s" % urllib.quote_plus(name + ":" + term)
    # Build tag cloud
    print >>out, "<h1>Tag cloud</h1>"
    print >>out, "<blockquote>"
    for term, freq in sorted(terms[:20], key=lambda x:x[0]):
        size = 100 + 100.0 * (freq - freq_min) / (freq_max - freq_min)
        print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(name, term), size, term)
    print >>out, "</blockquote>"
    print >>out, "<h1>All terms</h1>"
    print >>out, "<table>"
    print >>out, "<tr><th>Occurrences</th><th>Name</th></tr>"
    for term, freq in terms:
        print >>out, "<tr><td>%d</td><td><a href='/query?query=%s'>%s</a></td></tr>" % (freq, urllib.quote_plus(name + ":" + term), term)
    print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
# Change here for bind host and port
bottle.run(host="0.0.0.0", port=8024)
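A hedged usage sketch (the script and database names are assumptions): start the server with the database directory as its argument, then browse to http://localhost:8024/, the port hard-coded in the last line:
$ ./xserve.py tweets.db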
...and then we presented our work and ended up winning the contest. This was the story of how we wrote this set of award winning code.

11 September 2010

Tim Retout: Debian Perl talk

Today I went to HantsLUG at IBM Hursley. I delivered a talk on the Debian Perl team aimed at end users, which was well received - I got a head start by getting people in #debian-perl to review the slides beforehand, which was very helpful. I'm told there will be a video uploaded in a month or so. I also plugged SmoothWall Express on Debian to some new people, and there was interest. My most recent discovery is that I probably need to extend netcfg in the debian installer to allow configuring more than one network interface.

24 April 2010

Adam Rosi-Kessel: Green Lemonade Success

My most successful Green Lemonade to date:
Green Lemonade

Approximate recipe: Juice and consume. I have heard from green-juice skeptics before. It may be hopeless for some poor souls. But don't knock it until you've tried it.

16 January 2010

Wouter Verhelst: ACCEPTED

My mutt said this last night:
 894   + Jan 15 Archive Adminis (0,4K) ipcfg_0.1_amd64.changes ACCEPTED
This obviously means that if you wish to use it, you no longer need to go through git; you can just add experimental to sources.list and run 'apt-get install ipcfg'. A few notes, though: And in case you wonder why the hell I went from 0.1 to 0.3:
ipcfg (0.2) experimental; urgency=low
  * Rebuild without .git directory. D'oh.
 -- Wouter Verhelst <wouter>  Tue, 12 Jan 2010 17:43:09 +0100
srsly

26 February 2009

Riku Voipio: The not-slug: SheevaPlug

popcon lists approximately 1000 arm/armel Debian installations. It also lists 864 installations of nslu2-utils, letting us estimate that approximately 85% of Debian arm(el) installations are Linksys NSLU2s. It is affectionately nicknamed "The Slug", which pretty accurately describes the performance of the NSLU2. Still, people have found an absolutely amazing number of ways to use their slugs. What would you do with something approximately 10x more powerful in the same price/size range?

Enter the Marvell SheevaPlug




What   Slug      SheevaPlug
CPU    266 MHz   1.2 GHz
Cache  32 KB     32+256 KB
Flash  8 MB      512 MB
MEM    32 MB     512 MB
Net    100 Mbit  1 Gbit


And that's not everything - the SheevaPlug comes with an SDIO slot and a mini-USB port that can be used as a serial console (and JTAG). No soldering needed for hacking.

Some more details on the LinuxDevices article.

For those of you who think it has one port too few of something, or don't like the wall-wart design: other devices based on the Kirkwood SoC (which the SheevaPlug is built on) are on the way from various ODM/OEM houses.

13 December 2008

Mike Hommey: Shared subtrees and per-process namespaces

Now that we have seen what per-process namespaces and shared subtrees are and how to operate them, we can try to combine these two features. We'll be using our newns tool from this earlier post to create new namespaces. And for practical reasons, let's say you have a terminal with a 1$ prompt and a second terminal with a 2$ prompt (that will allow me to skip the "go to the second terminal" phrases). In the kernel, per-process namespaces are a bit like bind mounts, such that shared subtrees work with namespaces like they do with bind mounts. As with standard bind mounts, the default shared subtree mode is private, which means mounts done in either namespace will be private to that namespace. Only mount points active at the time the new namespace is created will be present in both namespaces:
1$ ./newns
1$ mount /dev/sda1 /mnt
1$ ls /mnt
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
2$ ls /mnt
2$ mount /dev/sda1 /cdrom
2$ ls /cdrom
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
1$ ls /cdrom
1$ umount /mnt
1$ exit
# Exit from newns
2$ umount /cdrom
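As an aside, and not part of the original post: newer versions of util-linux ship an unshare(1) tool that can stand in for the custom newns helper, assuming your util-linux is recent enough to include it; the equivalent of running ./newns would then be:
1$ unshare --mount /bin/sh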
shared mode allows both namespaces to share subsequent mounts. As with bind mounts, and for obvious reasons, you need to change the subtree mode before creating the new namespace:
1$ mount --make-rshared /
1$ ./newns
1$ mount --bind /usr /mnt
1$ ls /mnt
bin games include lib lib32 lib64 local sbin share src X11R6
2$ ls /mnt
bin games include lib lib32 lib64 local sbin share src X11R6
Like with shared bind mounts, the new mount point can be unmounted from either namespace:
2$ umount /mnt
1$ ls /mnt
1$ exit
# Exit from newns
A slave subtree will have mount points under its master (shared) subtrees propagated, while propagation won't happen in the other direction. Again, very much like bind mounts:
1$ mount --make-rshared / # For completeness, we already did that earlier
1$ ./newns
1$ mount --make-rslave /
1$ mount --bind /usr /mnt
1$ ls /mnt
bin games include lib lib32 lib64 local sbin share src X11R6
2$ ls /mnt
2$ mount --bind /usr /cdrom
2$ ls /cdrom
bin games include lib lib32 lib64 local sbin share src X11R6
1$ ls /cdrom
bin games include lib lib32 lib64 local sbin share src X11R6
1$ exit
# Exit from newns
2$ umount /cdrom
Contrary to shared and slave, unbindable doesn't add value when used in two different namespaces. It is only something that has an impact on the current namespace:
1$ mount --make-rshared / # For completeness, we already did that earlier
1$ ./newns
2$ mount --make-runbindable /
2$ mount --bind /usr /mnt
mount: wrong fs type, bad option, bad superblock on /usr,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
1$ mount --bind /usr /mnt
1$ ls /mnt
bin games include lib lib32 lib64 local sbin share src X11R6
1$ exit
# Exit from newns
1$ mount --make-rprivate / # Set back to default
Now we've seen what can be done with namespaces and shared subtrees, let's see what nice features can be implemented with both. As Russell revealed, pam-namespace is able to polyinstantiate some directories following rules set in /etc/security/namespace.conf (see namespace.conf(5)). The sad thing is that it apparently can't just create a new namespace without polyinstantiating a directory, which could be useful if you only want separate namespaces, but no polyinstantiated directory. What Russell's recipe means is that in the newly created namespace, a user reading in /tmp will actually be reading in /tmp-inst/$USER without the user knowing. Also, if root mounts something in the parent namespace, it will be propagated (/ being shared) to the user namespace. This means that mounting USB storage, for example, will be propagated to the user namespace. This also means that something mounted from the user namespace will also be propagated to the parent namespace. In both cases, this only applies to mounts occurring outside of /tmp, for which mounts don't get propagated. Note that without setting /tmp as private, when PAM mounts /tmp-inst/$USER, it would propagate as well, setting /tmp to /tmp-inst/$USER for everyone. So setting /tmp as private is mandatory. There is, however, a flaw in Russell's recipe: if any of the users mounts something under a submount of / (under /var, for example, if /var is a separate mount point), it won't be mounted for a user who already had a session open. For that to be possible, you have to use --make-rshared instead of --make-shared in Russell's recipe. A nice setup that can be done with all this is the following: Add the following to /etc/security/namespace.conf:
/tmp tmpfs tmpfs root
Add the following to /etc/pam.d/common-session:
session required pam_namespace.so
Up to here, it is pretty much the same as Russell's, except /tmp is a tmpfs instead of a real directory in /tmp-inst/, which can have some advantages. It doesn't seem to be possible to give pam-namespace a size for the tmpfs, though. Add the following to /etc/rc.local:
mount --make-rshared /
mount --bind /tmp /tmp
mount --make-private /tmp
Here again, this is the same as Russell's except for the --make-rshared correction discussed above. If /tmp is already a mount point (on my systems, it is a tmpfs), you can remove the mount --bind /tmp /tmp line from above. Add the following to /etc/security/namespace.init:
mount --make-rslave /
Now, this is where it gets interesting: we're setting the whole tree as slave in the user namespace, which means that if a user mounts a file system anywhere, it will only be seen in his session. This means the user can more safely mount encrypted volumes: they won't be available to other users (root can still go wandering in /dev/kmem, though). And you don't even need SELinux for that. The caveat is that if you open a root session with su from the user session, you are still inside the user's boundaries, and don't have access to the virtual filesystem as it is for init. And if you mount something as root then, while it will appear in the user namespace, it won't appear in other users' namespaces nor in init's, which can be a good thing in some cases, but a burden in others. It means you may need to set up a special user that won't get a new namespace in /etc/security/namespace.conf. An alternative model would be to only set the user's home as slave, which means anything mounted by the user in his home directory would stay private to his namespace, while anything mounted outside would be shared with other namespaces. For that, you may want to replace the lines we added to /etc/security/namespace.init with the following:
HOME=$(getent passwd $3 | cut -d: -f6)
mount --bind "$HOME" "$HOME"
mount --make-rslave "$HOME"
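Coming back to the caveat above about a special user that keeps the plain namespace: here is a sketch of what such an exemption could look like in /etc/security/namespace.conf. The "admin" account name is purely illustrative, and, if I read namespace.conf(5) correctly, the fourth field is the comma-separated list of users the polyinstantiation rule does not apply to:
# hypothetical example: keep root and a dedicated "admin" account out of /tmp polyinstantiation
/tmp tmpfs tmpfs root,admin
As discussed above, an account listed there won't get a new namespace, so its sessions keep seeing mounts the way init does.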
Whichever model you choose, the remaining problem is that a root session opened with su from the user session won't get access to the original /tmp. As we saw, there are various use cases for namespaces and shared subtrees. I'll follow up on these features again soon, as I'll be using them in yet another way to achieve a very different purpose.
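A quick sanity check (my addition, not part of the original recipe) is to compare the mount table a session sees with the one init sees; from inside a polyinstantiated session the per-user /tmp should show up in the difference, while a session for an exempted user should see none:
$ diff /proc/1/mounts /proc/self/mounts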

Mike Hommey: Shared subtrees

In reply to my previous post about per-process namespaces, Russell wrote about pam-namespace and shared subtrees. As he reports, pam-namespace (which I discovered on that occasion) can do what I was suggesting would be nice for pam-tmpdir. Anyway, I was actually already planning to write about shared subtrees and how they can be useful with namespaces. In this first post, I will introduce shared subtrees, while in a follow-up post I will show how they can usefully be combined with namespaces. As a preliminary note, you should know that while etch’s kernel supports shared subtrees, etch’s mount doesn’t support the necessary options. Lenny’s mount is unfortunately uninstallable on etch due to its dependency on a newer libc, but you can find the source code of a small smount utility in Documentation/filesystems/sharedsubtree.txt in the kernel source, where further explanations about the feature are also available. So you can either backport lenny’s util-linux to etch, or build smount to replicate what I will be doing below. If you are using smount, just replace
mount --make-type path
with
smount path type
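For instance, to make /a shared you would run either of the following, depending on which tool you have:
mount --make-shared /a
smount /a shared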
As seen in my previous post, bind-mounting allows you to attach a file hierarchy at another place in the virtual file system.
mount --bind / /mnt
will make all the contents of / (/bin, /etc, …) available through /mnt (/mnt/bin, /mnt/etc, …). On the other hand, sub-mounts will be ignored in such a case. For instance, /usr is usually a different mount point from /. It means /mnt/usr will be empty (provided it is empty in the underlying physical filesystem), instead of containing the same as /usr:
$ mount --bind / /mnt
$ ls /mnt/usr
$ ls /usr
bin games include lib lib32 lib64 local sbin share src X11R6
$ umount /mnt
If you also want /usr to be bind-mounted, you must use --rbind instead of --bind:
$ mount --rbind / /mnt
$ ls /mnt/usr
bin games include lib lib32 lib64 local sbin share src X11R6
$ ls /usr
bin games include lib lib32 lib64 local sbin share src X11R6
Obviously, sub-sub-mounts will also be propagated. Since recursively bind-mounting will create a bunch of mount points, unmounting can become a hassle. You can use the following command to unmount everything:
$ awk '$2 ~ /^\/mnt/ {print $2}' /proc/mounts | sort -r | xargs umount
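If some of those sub-mounts are busy, a blunter alternative (my addition, not from the original post) is a lazy unmount of the top-level bind mount, which detaches the whole tree from the visible hierarchy at once and cleans up once nothing uses it any more:
$ umount -l /mnt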
Now, this is where shared subtrees come in. After the bind mount has been done, if you mount something on either side of the bind mount, the new mount is not propagated to the other side. These are called private subtrees, and this is the default. But before doing anything else, let’s set up our testing environment:
$ mkdir -p /a/a /b
$ touch /a/b /a/c
$ ls /a
a b c
$ ls /b
After bind-mounting /a onto /b, let’s see what happens when mounting something under /a, then under /b:
$ mount --bind /a /b
$ ls /b
a b c
$ mount --bind /usr /a/a
$ ls /a/a
bin games include lib lib32 lib64 local sbin share src X11R6
$ ls /b/a
$ umount /a/a
$ mount --bind /usr /b/a
$ ls /a/a
$ ls /b/a
bin games include lib lib32 lib64 local sbin share src X11R6
$ umount /b/a
As I said earlier, these new mounts are not propagated. Note that I used bind-mounts as sub-mounts, but it would work all the same with other kinds of mounts (device, fuse, etc.). There are two other modes that allow some propagation: shared and slave. shared allows mounts on both ends to be shared. Note that you need to set the mode before bind-mounting:
$ umount /b
$ mount --bind /a /a
# This is needed because /a is not initially a mount point and you can only apply subtree modes to mount points.
$ mount --make-shared /a
$ mount --bind /a /b
$ mount /dev/sda1 /a/a
$ ls /a/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ ls /b/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ umount /a/a
$ mount /dev/sda1 /b/a
$ ls /a/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ ls /b/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ umount /b/a
You can even mount on one end and unmount from the other:
$ mount /dev/sda1 /a/a
$ umount /b/a
$ ls /a/a
slave allows mounts on the “master” end (/a in our case) to propagate to the “slave” end (/b), but not the other way around. The “master” end needs to be shared:
$ umount /b
$ mount --make-shared /a
# Only for completeness, we already set /a as shared earlier.
$ mount --bind /a /b
$ mount --make-slave /b
$ mount /dev/sda1 /a/a
$ ls /a/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ ls /b/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ umount /a/a
$ mount /dev/sda1 /b/a
$ ls /a/a
$ ls /b/a
config-2.6.26-1-amd64 grub initrd.img-2.6.26-1-amd64 System.map-2.6.26-1-amd64 vmlinuz-2.6.26-1-amd64
$ umount /b/a
There is a third mode, unbindable, that does something different:
$ umount /b
$ mount --make-unbindable /a
$ mount --bind /a /b
mount: wrong fs type, bad option, bad superblock on /a,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
$ mount --bind /a/a /b
mount: wrong fs type, bad option, bad superblock on /a,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg tail or so
As you can see, it disallows bind mounting of /a and its subdirectories to some other place. Finally, to put back the default mode, you can use:
$ mount --make-private /a
Just as --bind has --rbind, each of these modes has a recursive version: rshared, rslave, runbindable and rprivate, which apply the mode to sub-mounts as well.
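As a closing side note (my addition): on kernels that provide /proc/self/mountinfo (2.6.26 and later), you can inspect the propagation state of a mount point directly; the optional fields show a shared:N tag for shared mounts, master:N for slaves, unbindable for unbindable ones, and no tag at all for private mounts:
$ grep ' /a ' /proc/self/mountinfo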

29 August 2008

Meike Reichle: Where's Meike?

As September is traditionally the "conference month", I'll be travelling all around Germany over the next few weeks. So, inspired by Matthew, here's a short list of events I'll attend in September 2008. If anyone's up for a coffee, keysigning or something, let me know!

1 July 2008

Axel Beckert: Conkeror in the Debian NEW queue

I already mentioned a few times in this blog that I’m working on a Debian package of the Conkeror web browser. And now, after a lot of fine-tuning (and I still have further ideas for how to improve the package ;-) Conkeror is finally in the NEW queue and will hopefully hit unstable in a few days. (Update Thursday, 03-Jul-2008, 18:13 CEST: The package has been accepted by Jörg and should be included on most architectures in tonight’s updates.) Those who can hardly wait can fetch Conkeror .debs from http://noone.org/debian/. The conkeror package itself is architecture-independent (but needs xulrunner-1.9 to be available), and its small C helper program spawn-process-helper is available as the package conkeror-spawn-process-helper for i386, amd64, sparc, alpha, powerpc, kfreebsd-i386 and kfreebsd-amd64. There are no backported packages for Etch available, though, since I don’t yet know of anyone who has successfully backported xulrunner-1.9 to Etch. Interestingly, interest in Conkeror seems to have risen in the Debian community independently of its Debian packaging. Luca Capello, who sponsored the upload of my Conkeror package, pointed me to two blog posts on Planet Debian, written by people who are already fed up with Firefox 3 and are looking for a leaner, but still Gecko-based, web browser: Decklin Foster is fed up with Firefox’ -eh- Iceweasel’s arrogance, and MJ Ray is fed up with Firefox 3 and its SSL problems. Since my previously favoured Gecko-based web browser, Kazehakase, never became really stable but instead grew slow and leaky (and therefore not much better than Firefox 2), I can imagine that it’s no longer a candidate for people seeking a lean and fast web browser. Conkeror has some “strange” concepts, the primary one being that it looks and feels like Emacs. Footnotes: *) I just noticed that there is now also muttator, making Thunderbird look and behave like vim (and probably also mutt), too. I wonder into which e-mail client the Emacs community will convert Thunderbird. GNUS? RMAIL? VM? Wanderlust? What will it be called? Wunderbird? Thunderslust? (SCNRE ;-)

26 June 2008

Decklin Foster: The data are

So I always knew that my ThinkPad's display sucked. But now I have scientific proof! (Long-winded post about my job follows.) I was running some timing tests at work on the code for a psychology experiment. This involves quickly presenting a stimulus to the subject, then measuring reaction time and synchronizing with EEG readings. Nothing very complicated, but accurate timing is essential; you don't want another variable confounding your results. So I have this photocell device that sends a signal to a Mac through the EEG hardware when there is really actually physical light coming off of the screen, not just when I told the computer to display something. Now CRTs, of course, are actually scanning (with an electron gun) the display at 75 or 85 Hz or something, not constantly pumping out light to the whole screen. This means 12 or 13 (or whatever) ms between refreshes. So if you want something to appear for 100 ms, you have to fudge a little, and make it so that you get maybe 8 or 9 scans. You can synchronize your presentation to the refresh so this is feasible. But how long the subject really perceives it ultimately depends on how "flickery" the monitor is at a given refresh rate. LCDs, on the other hand, do not have this problem. They actually do produce light constantly -- their refresh rate is pretty much always only 60 Hz or less, but you get the same(ish) output over the entire cycle, instead of in blips that give the illusion of a constant image (in fact, as Matthew points out, they can continue to hold the same image for even longer periods between refreshes. Can't wait to try out that patch). And luckily for us, 6 cycles are in theory exactly 100 ms (in practice, it's close enough). Or so I thought. I could not, no matter how many times I banged my head against it, get consistent results between our laptop (also a ThinkPad) and any external display. For reasons I do not entirely understand (probably scheduling issues), on CRTs I was getting one extra cycle, so I had to reduce the time by one to compensate. This was done for published studies years before I arrived at the lab, so (for the sake of not introducing additional variables) there is an extremely large disincentive to go back and fix it the Right Way, whatever that is. Our laptop's internal display was producing results that were skewed like those of a CRT. I thought for a minute I was going nuts from testing all day, but it kept happening. Curious, I swapped out the lab's T60 for my faster T61 and (to rule anything else out) killed all my other processes and wrote a fresh PyEPL script that did nothing but repeatedly blink some text for 100 ms. Here is, on the external VGA port, a cheap old 15" Dell LCD I pulled from the server room: http://www.rupamsunyata.org/~decklin/blogfiles/20080626/vga-lcd.png (These images are from the Mac, which is still using System 9.something. I had to remember Command-Shift-3 to take a screenshot and use StuffIt to get them off of it. Srsly. I can't remember the last time I used StuffIt.) And here is a CRT, showing the refreshes: (The DIN3 line is the actual signal, and the others are integrated over some number of milliseconds to compensate for scanning and attempt to get a usable number. Of course, this is done in hardware, not synchronized to anything else.) http://www.rupamsunyata.org/~decklin/blogfiles/20080626/vga-crt.png Now, this is the one that blew my mind. 
The ThinkPad's internal LCD: http://www.rupamsunyata.org/~decklin/blogfiles/20080626/lvds-lcd.png You can see that the 60 Hz refreshes are spaced much farther apart, and that there are 6 cycles as intended, but. It's acting like a CRT! On the refresh, it puts out enough light to trigger the photocell, but in between, nothing. I ought to be noticing flicker and/or getting headaches all the time, if this is truly the case. (I guess I did get glasses for and because of staring at computer displays all the time. Should update my hackergotchi.) I have to admit I do not understand what is going on here at all. But it seems like making any assumptions about a display is (as the voice in my head was telling me all along) a very bad idea. Perhaps someone can email me some clue. But at least I have some hard data to validate my subjective opinion of the ThinkPad's internal LCD. Anyway: I still recommend the ThinkPad, even if the display is awful. I have a nice external monitor connected through DVI to the dock at home, and I mainly use a laptop so I can get out of the house and go hack outside or in a coffeeshop (not ideal lighting). The build quality, keyboard/trackpoint, performance, and the fact that Debian (testing or unstable, anyway) Just Works are all higher up on the list for me. I do wish I could afford the new lighter one, though.

8 June 2008

Matthew Garrett

My local Sainsbury's has what's possibly the most infuriating in-store music setup I've encountered yet. The selection itself isn't really an issue (it's fairly inoffensive music that seems to stretch from the 60s up to contemporary stuff), but every minute or so the music fades out and is replaced with advertising. This was bad enough when it was just talking about Mars Balls (you really don't want to know), but now it's impossible for me to buy even a loaf of bread and several litres of Pimm's without three or four exhortations to consider buying a bottle of rosé and drinking it on ice. It turns out that this is part of some promotion by Gallo to flog the stuff to professional young women or some such market segmentation bollocks, although I suspect that the net effect is actually just going to be me, a knife and some stabbings.

Srslywtfetc.

23 March 2008

MJ Ray: Random recent threads from planet debian

8 March 2008

Rob Bradford: Fennel

So. I bought some fennel from Borough Market and couldn’t really decide what to do with it. Since I needed to make flapjack I thought I’d maximise the use of the oven and make a spicy roast vegetable soup. So here is the recipe: Take some carrots, peel them (despite them being the sort you don’t need to peel, they’ve since gone a bit ick in the fridge) and slice in half. Then chop the fennel and two medium onions into the same size chunks. Place them in a deepish tray and drizzle with lots of olive oil. Then sprinkle over salt, sage and black pepper. Place in a lowish oven (~120°C) for half an hour or so, shaking halfway through. Just before you are going to take the veg out of the oven, melt around 50g of butter in a largeish saucepan and put the kettle on. Dump the veg into the butter and sweat for a little bit. Use some of the hot water from the kettle to scrub off any tasty bits from the roasting tray (not going to be much if it’s non-stick). Next, put two heaped teaspoons of bouillon powder (or a stock cube or two) in with the vegetables and add water until they are reasonably covered. Add chilli flakes and some black pepper. Bring to the boil and then simmer for 10 minutes. Next is the messy part: take off the heat and use a blender to blend until the texture is thick with the odd interesting bit. Put back onto the heat, bring back to the boil and simmer for another 5 minutes. Season to taste, adding a bit of fresh parsley at the last minute.

24 February 2008

Wouter Verhelst: Talk done

I held a talk this morning about the Belgian electronic ID card in Debian (and derivatives, like Ubuntu). Things were a bit tight there for a moment, because I overslept; but eventually, we did make it in time. The talk didn't go perfectly. Because I overslept, I didn't have enough time to test; and because I didn't test enough, I found out that the smartcard didn't work on the demo machine. Darn; it's pretty silly to hold a talk about how easy it is to use the electronic ID card on Debian, only to find out that I, myself, am unable to use it. Grmbl. Anyway, the talk worked out okay. Despite the early hour, a decent number of people showed up, including Fabian Arrotin, who held a talk about a similar topic in the CentOS room last year. Now for breakfast. Really.

16 January 2008

loldebian - Can I has a RC bug?: Why be an average guy any longer
